Mining and Ranking Generators of Sequential Patterns

Authors

  • David Lo
  • Siau-Cheng Khoo
  • Jinyan Li

Abstract

Sequential pattern mining, first proposed by Agrawal and Srikant, has been studied intensively because it applies to many real-life domains. Various improvements have been proposed, including mining the closed set of sequential patterns. Sequential patterns supported by the same sequences in the database can be regarded as belonging to one equivalence class. The patterns in an equivalence class have the same support and are partially ordered by the sub-sequence relation. Within an equivalence class, the maximal patterns are called closed patterns and the minimal patterns are called generators. Used together with closed patterns, generators provide information that closed patterns alone cannot. Moreover, because generators are the minimal members of their class, they are preferable to closed patterns for model selection and classification under the Minimum Description Length (MDL) principle. Several algorithms have been proposed for mining closed sequential patterns, but none so far for mining sequential generators. This paper fills this research gap by investigating the properties of sequential generators and proposing an algorithm that mines them efficiently. The algorithm works in three steps: search space compaction, non-generator pruning, and a final filtering step. We also introduce a ranking of the mined generators and propose mining a unique generator per equivalence class. A performance study on various synthetic and real benchmark datasets shows that mining generators can be as fast as mining closed patterns, even at low support thresholds.
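The definitions of equivalence class, closed pattern and generator can be made concrete with a small brute-force sketch. This is not the paper's three-step algorithm, which the abstract does not detail; it simply enumerates every subsequence, groups frequent patterns by the exact set of database sequences that support them, and keeps the maximal and minimal members of each group. As a simplifying assumption, each event is a single item (the general setting uses itemsets), and all function names and the toy database are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `e in it` advances the shared iterator, so event order is preserved.
    return all(e in it for e in pattern)


def support_set(pattern, db):
    """Indices of the database sequences that contain `pattern`."""
    return frozenset(i for i, seq in enumerate(db) if is_subsequence(pattern, seq))


def candidate_patterns(db):
    """Every distinct non-empty subsequence of every database sequence.

    Exponential in sequence length: suitable for toy databases only.
    """
    cands = set()
    for seq in db:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                cands.add(tuple(seq[i] for i in idx))
    return cands


def closed_and_generators(db, min_sup):
    """Group frequent patterns into equivalence classes keyed by their support set,
    then keep the maximal (closed) and minimal (generator) members of each class."""
    classes = defaultdict(set)
    for p in candidate_patterns(db):
        sup = support_set(p, db)
        if len(sup) >= min_sup:
            classes[sup].add(p)

    closed, generators = set(), set()
    for members in classes.values():
        for p in members:
            others = [q for q in members if q != p]
            # Closed: no other member of the class is a super-sequence of p.
            if not any(is_subsequence(p, q) for q in others):
                closed.add(p)
            # Generator: no other member of the class is a sub-sequence of p.
            if not any(is_subsequence(q, p) for q in others):
                generators.add(p)
    return dict(classes), closed, generators


if __name__ == "__main__":
    # Hypothetical toy database: each sequence is a tuple of single-item events.
    db = [("a", "b", "c"), ("a", "c"), ("b", "c")]
    classes, closed, generators = closed_and_generators(db, min_sup=2)
    print(sorted(closed))      # [('a', 'c'), ('b', 'c'), ('c',)]
    print(sorted(generators))  # [('a',), ('b',), ('c',)]
```

In this toy database, `a` and `ac` are both supported by sequences 0 and 1, so they fall into one equivalence class: `a` is its generator and `ac` its closed pattern. Grouping by the support set rather than the support count is safe here, because a sub- or super-sequence with the same count is necessarily supported by the same sequences. The exhaustive enumeration is exponential, so a practical miner needs pruning, such as the steps named in the abstract, to avoid it.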

Similar resources

VGEN: Fast Vertical Mining of Sequential Generator Patterns

Sequential pattern mining is a popular data mining task with wide applications. However, the set of all sequential patterns can be very large. To discover fewer but more representative patterns, several compact representations of sequential patterns have been studied. The set of sequential generators is one of the most popular representations. It was shown to provide higher accuracy for classifica...

Ranking Sequential Patterns with Respect to Significance

We present a reliable universal method for ranking sequential patterns (itemset-sequences) with respect to significance in the problem of frequent sequential pattern mining. We approach the problem by first building a probabilistic reference model for the collection of itemset-sequences and then deriving an analytical formula for the frequency of sequential patterns in the reference model. We r...

Towards an Efficient Ranking of Interval-Based Patterns

Almost all activities observed in today's applications are correlated with a timing sequence. Users are mainly looking for interesting sequences in such data. Sequential pattern mining algorithms aim at finding frequent sequences. Usually, the mined activities have durations that represent time intervals between their starting and ending points. Most sequential pattern mining approa...

A Single-scan Algorithm for Mining Sequential Patterns from Data Streams

Sequential pattern mining (SPAM) is one of the most interesting research issues in data mining. In this paper, a new research problem of mining data streams for sequential patterns is defined. A data stream is an unbounded sequence of data elements arriving at a rapid rate. Based on the characteristics of data streams, the problem complexity of mining data streams for sequential patterns is more ...

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Sequential pattern mining is the process of applying data mining techniques to a sequential database to extract frequent subsequences and discover the correlations that exist among ordered lists of events. Web usage mining (WUM), which discovers and extracts interesting knowledge/patterns from Web logs, is one of the applications of sequential pattern mining. In this paper, we present a survey of the se...

Publication date: 2008